Genetic Epidemiology
Wiley
Preprints posted in the last 30 days, ranked by how well they match Genetic Epidemiology's content profile, based on 46 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.
Li, Y.; Cabral, H.; Tripodis, Y.; Ma, J.; Levy, D.; Joehanes, R.; Liu, C.; Lee, J.
Mediation analysis quantifies how an exposure affects an outcome through an intermediate variable. We extend mediation analysis to capture the cumulative effects of longitudinal predictors on longitudinal outcomes. Our proposed model examines how mediators transmit the effects of the current and previous exposures on the current outcome. We construct a least-squares estimator for the cumulative indirect effect (CIE) and use three approaches (exact form, delta method, and bootstrap procedure) to estimate its standard error (SE). The estimator of the CIE is unbiased when there is no unmeasured confounding and the errors of the mediator and outcome models are independent at all time points, as shown by theoretical derivation and in simulations. While the three SE estimates are numerically similar, the bootstrap procedure is recommended because it is simple to implement. We apply this method to the Framingham Heart Study Offspring cohort to assess whether DNA methylation mediates the association of alcohol consumption with systolic blood pressure over two time points. We identify two CpGs (cg05130679 and cg05465916) as mediators and construct a composite DNA methylation score from 11 CpGs, which mediates 39% of the cumulative effect. In conclusion, we propose an unbiased estimator for the CIE. Future studies will investigate missingness in mediators and outcomes.
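To make the bootstrap SE idea concrete, here is a minimal sketch for the simplest single-time-point case: a product-of-coefficients indirect effect with a nonparametric bootstrap standard error. It is not the authors' cumulative (longitudinal) estimator; the function names, simulated data, and parameter values are all assumptions chosen for illustration.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b.

    a: slope of the mediator model  m ~ x
    b: coefficient of m in the outcome model  y ~ x + m
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    # Solve the 2x2 normal equations of y ~ x + m for the m coefficient.
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_se(x, m, y, n_boot=200, seed=0):
    """Standard deviation of the estimate over resampled data sets."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    mean = sum(estimates) / n_boot
    return (sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)) ** 0.5


def simulate(n=2000, a=0.5, b=0.8, direct=0.3, seed=1):
    """Hypothetical data: x -> m -> y plus a direct x -> y path."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [a * xi + rng.gauss(0, 1) for xi in x]
    y = [direct * xi + b * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y


if __name__ == "__main__":
    x, m, y = simulate()
    print("indirect effect:", indirect_effect(x, m, y))
    print("bootstrap SE:   ", bootstrap_se(x, m, y))
```

With the simulated coefficients above, the true indirect effect is 0.5 × 0.8 = 0.4, so the printed estimate should land near that value.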
Wang, J.; Morrison, J.
Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal relationships between complex traits. Standard MR can be used to estimate an average causal effect at the population level, and typically assumes a linear exposure-outcome relationship. Recently, several methods for estimating nonlinear effects have been developed. However, many have been found to produce spurious empirical findings when subjected to negative control analyses. We propose that this poor performance may be attributable to heterogeneity in variant-exposure associations. We demonstrate that heterogeneous genetic effects on exposure lead to biased estimates, poor coverage, and inflated type I error in control function and stratification-based methods. In contrast, two-stage least squares (TSLS) methods are robust to such heterogeneity, but suffer from low precision and low power in some circumstances. We show that a statistical test for heterogeneity can be used to guide the choice of nonlinear MR methods. Using UK Biobank data, we reassess the causal effects of BMI, vitamin D, and alcohol consumption on blood pressure, lipids, C-reactive protein, and age (negative control). We find strong evidence of heterogeneity for all three exposures, and also recapitulate previous results that control function and stratification-based methods are prone to false positives. Finally, using nonparametric TSLS, we identify evidence of nonlinear causal effects of BMI on HDL cholesterol, triglycerides, and C-reactive protein; however, specific estimates of the shape of these relationships are imprecise. Altogether, our results suggest that common nonlinear MR methods are unreliable in the presence of realistic levels of heterogeneity, and that more methodological development is required before practically useful nonlinear MR is feasible.
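For readers unfamiliar with TSLS, the following sketch shows the basic linear, single-instrument version on simulated data with an unmeasured confounder. It is a simplification under stated assumptions, not the nonparametric TSLS used in the abstract; the variable names, effect sizes, and the three-level instrument are hypothetical.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def tsls(z, x, y):
    """Linear two-stage least squares with a single instrument z."""
    # Stage 1: predict the exposure from the instrument.
    a0, a1 = ols_fit(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the predicted exposure.
    _, beta = ols_fit(x_hat, y)
    return beta


def simulate(n=5000, beta=0.5, seed=0):
    """Hypothetical data: u confounds x and y, z affects y only via x."""
    rng = random.Random(seed)
    z = [rng.choice((0, 1, 2)) for _ in range(n)]  # stand-in for allele counts
    u = [rng.gauss(0, 1) for _ in range(n)]        # unmeasured confounder
    x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive OLS slope:", ols_fit(x, y)[1])  # biased upward by u
    print("TSLS estimate:  ", tsls(z, x, y))     # close to the true 0.5
```

The naive regression absorbs the confounder into the slope, whereas the instrument-based estimate recovers the simulated effect at the cost of a wider sampling spread, which mirrors the precision trade-off the abstract describes.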
Nouira, A.; Favre Moiron, M.; Tournaire, M.; Verbanck, M.
Genome-wide association studies (GWAS) have identified numerous genetic variants associated with complex traits. However, linkage disequilibrium (LD) confounds these associations, leading to false positives where non-causal variants appear associated because they are correlated with nearby causal variants. This is particularly the case in highly polygenic traits where the genome can be saturated in causal variants. To address this issue, we propose LDeconv, a method based on truncated singular value decomposition (SVD) that adjusts GWAS summary statistics without requiring individual-level genotype data. This approach accounts for LD structure, isolates causal variants in high-LD regions, and improves the reliability of effect size estimates. We assess its performance through simulations across various LD scenarios, conduct extensive sensitivity analyses, and apply it to real GWAS data from the UK Biobank. Our results demonstrate that LDeconv effectively reduces false discoveries while preserving true associations, offering a robust framework for post-GWAS analysis.
Ahlqvist, V. H.; Sjoqvist, H.; Gardner, R. M.; Lee, B. K.
Background: Sibling-matched designs control for shared familial confounding but remain vulnerable to non-shared confounders. Bi-directional sensitivity analyses, which stratify families by whether the older or younger sibling was exposed, are commonly used to assess carryover effects. We aimed to demonstrate how this methodological approach can introduce severe confounding by parity. Methods: We conducted simulations motivated by a recent epidemiological study. The true causal effect of a hypothetical exposure (prenatal acetaminophen) on neurodevelopmental outcomes was set to strictly null. To introduce parity-related confounding, baseline exposure and outcome probabilities were varied slightly by birth order. We compared conditional logistic regression effect estimates from total sibling models against bi-directional stratified models. Results: In the total simulated sibling cohort, models yielded the true null effect (odds ratio = 1.00) when adjusting for parity. However, the bi-directional analyses exhibited divergent artifactual signals. Because parity is perfectly collinear with exposure in these stratified subsets, it cannot be adjusted for. For example, when the older sibling was exposed, the odds ratio for autism spectrum disorder was 1.68; when the younger was exposed, the odds ratio was 0.60. Conclusions: Divergent estimates in bi-directional sibling analyses can be a predictable artifact of parity confounding rather than evidence of carryover effects or invalidating unmeasured bias. Overall sibling models adjusting for parity may remain robust despite divergent stratified sensitivity results.
Gantenberg, J. R.; La Joie, R.; Heston, M. B.; Ackley, S. F.
Qualitative models of Alzheimer's pathology often posit that amyloid accumulation follows a sigmoid curve, indicating that the rate of deposition wanes over time. Longitudinal PET data now allow us to investigate amyloid accumulation trajectories with greater detail and over longer follow-up periods. We combine inferences from simulated amyloid trajectories, empirical PET data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), and the sampled iterative local approximation algorithm (SILA) to assess whether amyloid accumulation reaches a physiologic ceiling. We find that SILA reliably detects a ceiling, when present, across a range of simulated scenarios that impose a sigmoid shape. When fit to empirical data from ADNI, however, SILA does not appear to indicate the presence of a ceiling. Thus, we conclude that amyloid trajectories may not reach a physiologic ceiling during the stages of Alzheimer's disease typically observed while patients remain under follow-up in cohort studies. Fits using SILA indicate that illustrative models of biomarker cascades, while useful tools for conceptualizing and interrogating pathologic processes, may not represent the shapes of amyloid trajectories accurately. Summary for General Public: Amyloid, a protein implicated in Alzheimer's disease, is thought to reach a plateau in the brain, but methods that estimate how amyloid changes over time suggest it grows unabated. Gantenberg et al. use one such method and simulations to argue that amyloid does not reach a plateau during the typical course of Alzheimer's.
Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.
Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.
Zheng, W.; Liu, T.; Xu, L.; Xie, Y.; Jing, Y.; Shao, H.; Zhao, H.
Phenome-wide association studies (PheWAS) enable systematic exploration of relationships between genetic variants and clinical phenotypes derived from electronic health records (EHRs). Conventional regression-based PheWAS treats phenotypes separately and relies on binary phenotype representations, which limits statistical power for rare variants and rare phenotypes and reduces the ability to detect associations with phenotypes that are distributed across clinical codes. To address this limitation, we first developed EmbedPheScan, a phenotype embedding-based prioritization framework that summarizes the phenotypic profiles of rare loss-of-function variant carriers in a continuous embedding space. We then proposed EA-PheWAS by combining these embedding-derived signals with conventional regression-based PheWAS results using the aggregated Cauchy association test. Using the UK Biobank whole-exome sequencing and EHR data, we show that the proposed methods maintain appropriate false-positive control. We then performed genome-wide phenome scans across all genes and across biologically defined gene classes to evaluate EA-PheWAS relative to conventional PheWAS and EmbedPheScan, consistently finding that EA-PheWAS outperformed the other two methods. We illustrate the utility of EA-PheWAS focusing on four genes representing distinct scenarios, including strong-effect disease genes (PKD1, PKD2), genes with large numbers of rare LoF carriers (NF1), and genes with extremely sparse carrier counts (FBN1).
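The aggregated Cauchy association test combines p-values by mapping each one to a standard Cauchy variate, averaging, and mapping back. The sketch below shows that combination rule with equal weights; how EA-PheWAS weights its embedding-based and regression-based signals is not stated in the abstract, so the `weights` argument and the function name are assumptions.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with the weights summing to 1,
    and the combined p-value is 0.5 - atan(T) / pi.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights (an assumption)
    total = sum(weights)
    t = sum(w / total * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # A strong signal from one source dominates a weak one from the other.
    print(cauchy_combination([0.001, 0.8]))
```

A useful property of this rule is that the combined p-value is driven mainly by the smallest inputs, so one strong signal is not diluted much by an uninformative companion test.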
O'Mahony, D. G.; Beasley, J.; Zanti, M.; Dennis, J.; Dutta, D.; Kraft, P.; Kristensen, V.; Chenevix-Trench, G.; Easton, D. F.; Michailidou, K.
Summary statistics fine-mapping methods offer advantages over classical methods, including avoiding data-sharing constraints and improved modelling of correlated variables and sparse effects. However, their performance has not been comprehensively evaluated in breast cancer using real-world data. Previous multinomial stepwise regression (MNR) fine-mapping analyses for breast cancer identified 196 credible sets. Here, we apply summary statistics fine-mapping, compare methods, and assess parameters influencing performance. Using summary statistics from the Breast Cancer Association Consortium, we compared finiMOM, SuSiE, and FINEMAP to published MNR results across 129 regions. Performance was assessed by recall using in-sample and out-of-sample LD. Discordant credible sets were examined for technical factors, and target genes were defined using the INQUISIT pipeline. SuSiE showed the closest agreement with MNR. Results varied across regions depending on the assumed number of causal variants (L), with higher values reducing recall and no single L maximising performance. At optimal L per region, SuSiE identified 8,192 CCVs in 244 credible sets, with recall of 88%, 86%, and 72% for overall, ER-positive, and ER-negative breast cancer. Thirty MNR sets were missed. Discordance was partially explained by allele flips, imputation quality, and array heterogeneity. Fifty-two MNR-identified genes, including BRCA2, WNT7B and CREBBP, were not recovered, while additional candidate genes were identified. Using out-of-sample LD reduced recall by 3% but identified novel variants. Fine-mapping results vary across methods, and no single approach is sufficient. The choice of L strongly influences results, and combining analytical approaches with functional validation can improve causal variant identification.
Testa, L.; Klei, L.; Rengle, A.; Yocum, A.; Lewis, D. A.; Devlin, B.; Roeder, K.; MacDonald, M. L.
A single gene can encode multiple versions of a protein, dubbed isoforms, with varying functionality. Cellular control of isoform abundances is critical for multiple aspects of biology and is only partially regulated by transcript levels. While long-read sequencing facilitates transcript quantification, quantifying the resulting protein isoforms on a large scale is a major challenge, complicating biological interpretation of transcript alterations. Standard "bottom up" mass spectrometry can assess only short portions of isoforms called peptides, and these peptides often map onto more than one isoform. We introduce PAQu, a novel Bayesian method that leverages multiomic information from the peptidome and transcriptome to provide accurate estimates of isoform abundance even when peptide mapping is ambiguous. PAQu offers several advantages over existing methods in a unified framework. It provides uncertainty quantification, integrates multiomic information for improved accuracy, and provides a rigorous framework for hypothesis testing. Extensive simulations show that PAQu consistently outperforms competing methods in detecting differentially expressed protein isoforms and estimating their abundances. We use PAQu to investigate differences in isoform abundance levels between people with schizophrenia and control subjects, confirming a long-held hypothesis that levels of the C4A isoform of Complement Component 4 are increased in schizophrenia while those of C4B are not. These results demonstrate that PAQu can identify significant variations in isoform abundance levels that could not previously be detected.
Ihejirika, S. A.; Stephen, E.; Ye, K.
Gene-environment interactions (GEI) contribute to circulating polyunsaturated fatty acid (PUFA) and monounsaturated fatty acid (MUFA) profiles. GEI may partly explain differences in trait variance across genotype groups. To identify GEI for circulating unsaturated fatty acids, we adopted a two-stage strategy. First, we detected quantitative trait loci associated with trait variance (vQTLs). Second, we tested these vQTLs for interaction with fish oil supplements (FOS). We performed genome-wide vQTL screens for 14 plasma PUFA and MUFA phenotypes in a UK Biobank subset of 200,478 participants. At the genome-wide significance threshold (p < 5.0 × 10⁻⁸), we identified 172 vQTL-trait pairs across all 14 traits, and 16 of these vQTLs had no marginal genetic effect on the corresponding trait. We found 46 non-overlapping loci across all phenotypes, with an average of 12 vQTLs per trait. Omega-6% and PUFA% had the most independent vQTLs (N = 24) while DHA% and Omega-3% had the fewest (N = 1 and 2, respectively). For each of the 172 vQTL-trait pairs, we tested the interaction effect of the vQTL with FOS on the corresponding trait. We found six significant interaction signals in DHA, DHA%, Omega-3, Omega-3%, LA, and Omega-6/Omega-3 ratio around the FADS1/2, ZPR1, and SUGP1/TM6SF2 genes. Our results provide a comprehensive resource of vQTLs and gene-FOS interactions shaping the circulating levels of unsaturated fatty acids.
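The first stage, screening for variance differences across genotype groups, can be illustrated with the Brown-Forsythe (median-centred Levene) statistic. The abstract does not say which variance test was used, so this choice is an assumption, as are the function name and the toy data. The sketch returns only the test statistic; turning it into a p-value requires an F distribution with (k - 1, N - k) degrees of freedom, which is left out here.

```python
import statistics


def brown_forsythe_w(groups):
    """Brown-Forsythe statistic for equality of variances.

    groups: one list of trait values per genotype group (e.g. 0/1/2 copies).
    Each value is replaced by its absolute deviation from the group median,
    and a one-way ANOVA F-statistic is computed on those deviations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    devs = []
    for g in groups:
        med = statistics.median(g)
        devs.append([abs(v - med) for v in g])
    group_means = [sum(d) / len(d) for d in devs]
    grand_mean = sum(sum(d) for d in devs) / n_total
    between = sum(len(d) * (gm - grand_mean) ** 2
                  for d, gm in zip(devs, group_means))
    within = sum((v - gm) ** 2
                 for d, gm in zip(devs, group_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    same_spread = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
    different_spread = [[1, 2, 3, 4, 5], [0, 10, 20, 30, 40]]
    print(brown_forsythe_w(same_spread))       # means differ, spreads do not
    print(brown_forsythe_w(different_spread))  # spreads differ
```

Because the statistic is built from deviations around each group's own median, a variant that shifts only the trait mean does not register, which is why a vQTL can exist without a marginal effect.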
Tomasi, J.; Xu, H.; Zhang, L.; Carey, C. E.; Schoenberger, M.; Yates, D. P.; Casas, J.
Background: Elevated lipoprotein(a) [Lp(a)] is a known risk factor for several cardiovascular-related diseases, as established by multiple genetic and observational studies. However, the underlying mechanisms mediating the effects of Lp(a) levels on cardiovascular disease risk and major adverse cardiovascular events (MACE) are unclear. The aim of this study was to identify proteins downstream of Lp(a) using Mendelian randomization (MR), a genetic causal inference approach. Methods: A two-sample MR was performed by initially identifying Lp(a) genetic instruments based on data from genome-wide association studies (GWAS) of Lp(a) blood concentrations. These instruments were then tested for association with proteins from proteomic pQTL data (Olink from UK Biobank, 2940 proteins and SomaScan from deCODE, 4907 proteins). Results: A total of 521 proteins associated with Lp(a) were identified. Using pathway enrichment analysis, the following MACE-relevant pathways were identified comprising a total of 91 Lp(a) downstream proteins: oxidized phospholipid-related, chemotaxis of immune cells and endothelial cell activation, pro-inflammatory monocyte activation, neutrophil activity, coagulation, and lipid metabolism. Conclusion: The results suggest that the influence of Lp(a) treatments is primarily through modifying inflammation rather than lipid-lowering, thus providing insight into the mechanistic framework which mediates the effects of elevated Lp(a) on atherosclerotic cardiovascular disease.
Petrin, A. L.; Keen, H. L.; Dunlay, L.; Xie, X. J.; Zeng, E.; Butali, A.; Wilcox, A.; Marazita, M. L.; Murray, J. C.; Moreno-Uribe, L.
Introduction: Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is a common congenital malformation with complex etiology involving both genetic and environmental factors. Epigenetic mechanisms may mediate environmental contributions, but separating genetic from environmental effects remains challenging. Methods: We present an epigenome-wide association study with 32 monozygotic and 22 dizygotic twin pairs discordant for NSCL/P on blood and saliva samples. Differential methylation analysis was conducted using linear models to identify CpG sites showing significant methylation differences between affected and unaffected twins, followed by functional annotation and pathway enrichment analysis. Results: The top-ranked finding is a differentially methylated region comprising two CpG sites at the CYP26A1 locus, cg12110262 (P = 3.21 × 10⁻⁷) and cg15055355 (P = 1.39 × 10⁻³). CYP26A1 is essential for retinoic acid catabolism and craniofacial patterning. The chromatin regulator ANKRD11, which causes KBG syndrome featuring cleft palate, was the second-best hit. Differentially methylated CpG sites showed significant enrichment in craniofacial enhancers and overlap with multiple GWAS-validated cleft genes including VAX1, PVRL1, SMAD3, and PRDM16. Conclusions: Our findings implicate retinoic acid signaling and chromatin regulation in NSCL/P etiology and demonstrate the value of discordant twin designs for distinguishing environmental from genetic epigenetic contributions to complex malformations.
Alonso-Gonzalez, A.; Jaspez, D.; Lorenzo-Salazar, J. M.; Delgado, A.; Quintero-Bacallado, A.; Ma, S.-F.; Strickland, E.; Mychaleckyj, J.; Kim, J. S.; Huang, Y.; Adegunsoye, A.; Oldham, J. M.; Maher, T. M.; Guillen-Guio, B.; Wain, L. V.; Allen, R. J.; Saini, G.; Jenkins, R. G.; Molina-Molina, M.; Zhang, D.; Kim Garcia, C.; Martinez, F. J.; Noth, I.; Flores, C.
Background: Idiopathic pulmonary fibrosis (IPF) is a rare disease with a poor prognosis. Disease risk involves rare and common genetic variants. However, an inverse association has been described between them. Accordingly, IPF patients with a higher polygenic risk score (PRS) for IPF are less likely to carry rare deleterious variants and vice versa. Here, we evaluate whether a PRS for IPF could serve as an additional criterion for patient prioritisation in rare variant discovery. Methods: We identified carriers based on the presence of rare qualifying variants (QVs) in genes linked to monogenic forms of pulmonary fibrosis in 888 IPF patients from the Pulmonary Fibrosis Foundation Patient Registry (PFF-PR). Genome-wide association study (GWAS) summary statistics from independent cohorts were used to construct a whole-genome PRS (WG-PRS) using a clumping and thresholding method (C+T) and a Bayesian method (SBayesRC). PRS were also derived from 19 known common sentinel IPF variants (Sentinel-PRS). Logistic regression models were used to evaluate associations between PRS and carrier status. Discriminatory performance was evaluated using area under the curve (AUC) analysis, and comparisons were made with the DeLong test. Validation was performed in 472 IPF individuals from the UK PROFILE cohort. Results: IPF-PRS were strongly associated with QV carrier status: Odds Ratio [OR] 0.65 (95% Confidence Interval [CI] 0.53-0.79) for WG-PRS_C+T, OR 0.71 (95% CI 0.59-0.86) for WG-PRS_SBayesRC, and OR 0.77 (95% CI 0.63-0.94) for Sentinel-PRS. Adding WG-PRS to the patient's personal clinical history improved the prediction of QV carriers: AUC=0.62 for the clinical model, AUC=0.68 for WG-PRS_C+T (DeLong test, p=9.54 × 10⁻⁴) and AUC=0.66 for WG-PRS_SBayesRC (DeLong test, p=0.02). Adding IPF-PRS to clinical variables correctly reclassified 22.8% of carriers when using WG-PRS_C+T, 20.8% when using Sentinel-PRS, and 16.7% when using WG-PRS_SBayesRC. WG-PRS_SBayesRC and the Sentinel-PRS also demonstrated improved prediction of QV carriers in telomere-related genes in PROFILE. Conclusions: Incorporating IPF-PRS into a model based on the patient's clinical history improves the identification of QV carriers. Although the overall discriminatory power was moderate, these findings raise the possibility of using WG-PRS as a useful criterion for rare variant discovery in patients with IPF and to enhance decision-making.
Cooper, H. B.; Rojas Lopez, K. E.; Schiavinato, D.; Black, M. A.; Gardner, P. P.
Proteins and non-coding RNAs are functional products of the genome that are central for crucial cellular processes. With recent technological advances, researchers can sequence genomes in the thousands and probe numerous genomic activities of many species and conditions. Such studies have identified thousands of potential proteins, RNAs and associated activities. However, there are conflicting interpretations of the results and therefore of which regions of the genome are "functional". Here we investigate the relative strengths of associations between coding and non-coding gene functionality and genomic features, by comparing reliably annotated functional genes to non-genic regions of the genome. We find that the genomic features most strongly and consistently associated with functional genes are transcriptional activity and evolutionary conservation. We also evaluated sequence-based statistics, genomic repeats, epigenetic and population variation data. Other features strongly associated with function include histone marks, chromatin accessibility, genomic copy-number, and sequence alignment statistics such as coding potential and covariation. We also identify potential issues with SNP annotations in short non-coding RNAs, as some highly conserved ncRNAs have significantly higher than expected SNP densities. Our results demonstrate the importance of evolutionary conservation and transcription activity for indicating protein-coding and non-coding gene function. Both should be taken into consideration when differentiating between functional sequences and biological or experimental noise.
Kornilov, S. A.
Shenhar et al. (2026) report 50% "intrinsic" lifespan heritability after calibrating a one-component correlated-frailty survival model to Scandinavian twin lifespans. Their framework is mathematically coherent, but the intrinsic component is not identified if heritable, mortality-relevant extrinsic susceptibility is omitted at calibration. We show that one-component calibration absorbs omitted familial extrinsic structure into the intrinsic frailty scale parameter σ_θ, and that this variance absorption is visible through separate diagnostics. (1) Variance absorption. Under misspecification, σ_θ is inflated by +22.1% (95% CI: 21.5-22.7%), corresponding to +49% inflation in [Formula]. Falconer h² is downstream of calibration and inherits a +9.2 pp bias (95% CI: 8.7-9.7). The σ_θ inflation is model-general: +22% (GM), +18% (MGG), +14% (SR); any dependence summary that is strictly increasing in σ_θ inherits this inflation, so Falconer h² is one affected downstream quantity among many (Corollary B3). (2) Structural fingerprint. In the joint twin survival surface S(t1, t2), misspecification produces systematic dependence errors (ISE 48× that of the recovery model). Conditional twin dependence is inflated at all ages, peaking at age 80 (Δr = 0.048). (3) Specificity. The bias requires an omitted component that is both heritable and mortality-relevant. Three negative controls, a boundary check (ρ = 0), and a two-component recovery refit (σ_θ restored to within -3.2%) establish specificity. ACE decomposition yields C ≈ 0 throughout: the omitted extrinsic component loads onto A (because it is shared 1.0/0.5 in MZ/DZ), so switching summary statistics does not restore identification. (4) Sensitivity and falsifiability. Over an empirically anchored regime (σ_γ ∈ [0.30, 0.65], ρ ∈ [0.20, 0.50]), Falconer bias ranges from +2.8 to +18.9 pp (mean 9 pp). If ρ is sufficiently negative, the bias reverses sign in all three model families (Corollary B4). A full-likelihood robustness check shows that this upward pull is partly structural and partly estimator-specific: in the same misspecified one-component model, ML still inflates σ_θ (+3%), whereas matching only r_MZ inflates it much more (+21%). These results do not resolve true intrinsic heritability but establish that Shenhar et al.'s 50% estimate carries a structured, model-general upward bias originating in the fitted latent variance σ_θ.
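As a reminder of why a component shared 1.0/0.5 in MZ/DZ pairs is counted as heritable, the standard Falconer estimator under the textbook ACE model can be written out as follows. This is the generic twin-model identity, offered as background under the usual ACE assumptions, not the abstract's frailty-model derivation.

```latex
\begin{align}
r_{MZ} &= A + C, \\
r_{DZ} &= \tfrac{1}{2}A + C, \\
h^2_{\text{Falconer}} &= 2\,(r_{MZ} - r_{DZ}) = A .
\end{align}
```

Any source of twin resemblance that is twice as strong in MZ as in DZ pairs enters both correlations exactly as A does, so the doubled difference attributes it to A regardless of whether it is intrinsic or extrinsic.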
Wang, Y.; Tuftin, B.; Raffield, L. M.; Hidalgo, B.; Kerns, S. L.; DeWan, A. T.; Leal, S. M.; Auer, P.
Individuals with admixed ancestry comprise a significant proportion of populations of the Americas. Statistical methods have been developed to specifically leverage local ancestry inference to enhance the power and interpretability of genome-wide association studies in admixed populations. However, no such methods currently exist to test for rare-variant aggregate associations. Here we present LANTERN (Leveraging local ANcestry Tracts to Enhance Rare variaNt aggregate associations), a method that infers the alleles that lie on each ancestral haplotype and conducts rare-variant aggregate association testing in a generalized linear mixed model framework. Through simulation studies we demonstrated that LANTERN achieves proper control of Type 1 error while boosting power to detect associations when causal alleles predominately lie on one ancestral haplotype. Using data from a cohort of African American participants from the Jackson Heart Study, LANTERN identified two genes known to be involved in red-blood cell (RBC) biology when local ancestry information was incorporated. Specifically, a burden of rare alleles on European ancestral haplotypes in EPO was associated with both hemoglobin levels (HGB) and RBC counts, whereas a burden of rare alleles on African ancestral haplotypes in EPB42 was associated with HGB and RBC. In summary, LANTERN (i) allows for the identification of ancestry-specific rare-variant associations; and (ii) enhances rare-variant association signals compared to an analysis that ignores local ancestry. LANTERN is implemented in R and is freely available on GitHub.
Zhang, X.; Joehanes, R.; Ma, J.; Pain, O.; Levy, D.; Westerman, K.; Bell, J. T.
Body fat distribution is a strong predictor of cardiometabolic disease risk. Gene-environment and gene-gene interactions can affect body fat distribution, resulting in differential phenotypic variance across genotype groups that can be detected through variance quantitative trait loci (vQTLs). Using UK Biobank MRI data in conjunction with genetic data, we explored evidence for vQTLs for body fat distribution phenotypes aiming to uncover potential genetic interactions. We identified three vQTLs for liver fat distribution, including rs738408 (PNPLA3), rs429358 (APOE), and rs58542926 (TM6SF2), and one vQTL region (FTO) for abdominal subcutaneous adipose tissue. To dissect putative gene-environment and gene-gene interactions underlying these signals, we identified multiple vQTL-environment interactions and one epistatic effect (rs58542926*rs429358) for liver fat. The vQTLs and interaction results were validated in multiple UK Biobank and external replication cohort datasets (Framingham Heart Study, All of Us, and TwinsUK), showing replication of the three liver vQTLs with the greatest reproducibility for vQTL rs738408. Our findings uncover vQTLs and underlying interaction effects on body fat distribution, especially liver fat, that may be useful for the development of precision medicine approaches.
Sato, Y.; Hamazaki, K.
Individual phenotypes often depend on the genotypes of other individuals within a group. These phenomena are termed indirect genetic effects (IGEs) and have been distinguished from direct genetic effects (DGEs) using quantitative genetic models. Recent studies have utilized high-resolution polymorphism data to enable genomic prediction (GP) and genome-wide association study (GWAS) of IGEs, but unified methods remain limited. Here we integrate polygenic and oligogenic IGEs using a multi-kernel mixed model incorporating two random effects with a single covariance parameter. Underlying this implementation, the Ising model of ferromagnetics enabled us to simplify locus-wise and background IGEs for GWAS and GP, respectively. Our simulations demonstrated that, while the previous and present models exhibited similar performance, the present model can infer a trade-off between DGEs and IGEs. By applying this method to three species of woody plants, we found evidence for intergenotypic competition in aspen and apple trees, but limited evidence in climbing grapevines. Based on GWAS, we also detected significant variants associated with the competitive IGEs on the apple trunk growth. Our study offers a flexible implementation for GWAS/GP of IGEs, thereby providing an effective tool to dissect the genetic architecture of group performance.
Iotchkova, V.; Weale, M. E.
Multi-trait colocalisation is a vital tool to make sense of the large amounts of GWAS data available on platforms like Mystra. It identifies genetic association signals that cluster together, allowing us to infer which gene might be causal for a trait and also which constellation of biological effects might be affected by modulating that gene. Multi-trait colocalisation is a challenging computational problem. Here, we introduce MystraColoc, a Bayesian algorithm for multi-trait colocalisation that works across hundreds or even thousands of GWAS datasets. We illustrate its power both via a worked example at the HDAC9-TWIST1 locus, and via a simulation study that demonstrates its superior clustering performance compared to alternative methods.
Han, G.; Yuan, A.; Oware, K. D.; Wright, F.; Carroll, R. J.; Smith, M.; Ory, M. G.; Yan, D.; Wang, W.; Sun, Z.; Dai, Q.; Allen, C.; Dang, A.; Liu, Y.
Alzheimer's disease genomics and other high-dimensional omics studies demand powerful statistical methods, yet Bayesian inference remains underutilized despite its advantages in small-sample settings, owing to the prohibitive cost of eliciting reliable priors across thousands or millions of parameters. We propose an AI-assisted Bayesian-frequentist hybrid inference framework that couples large language model-based prior elicitation with the hybrid inference theory of Yuan (2009). ChatGPT-4o is queried via a standardized prompt to assess the strength of evidence linking each gene to a disease of interest, and the response is mapped to an informative normal prior via a standardized effect-size calibration. Parameters for covariates of secondary interest are treated as frequentist parameters, preserving efficiency and avoiding sensitivity to mis-specified priors. We derive closed-form hybrid estimators under uniform and conjugate normal priors in linear models, establish their asymptotic equivalence to the frequentist and full Bayes estimators, and show in simulations that hybrid inference using unconditional variance estimation leads to high statistical power while accurately controlling the Type I error rate. Applied to single-cell RNA sequencing data from the ROSMAP cohort for Alzheimer's disease as an example, the framework identifies biologically coherent pathways (such as gamma-secretase pathways) previously undetected. The proposed framework offers a principled and computationally scalable approach to genome-wide Bayesian analysis, with potential for broad application across omics platforms and disease settings.